In a previous blog post, we discussed the necessity of protecting the availability of critical healthcare services. Today, we will walk through how to design healthcare networks for high availability. To do that, we must first understand how high availability is measured. Network providers often talk about the five “9”s of high availability: 99.999%. This equates to roughly 5.26 minutes of downtime per year. By comparison, a network with 99.99% availability experiences about 52.6 minutes of downtime annually.
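The arithmetic behind these figures is simple enough to sketch. The snippet below is an illustration only: the function name is our own, and it assumes a 365-day year, so results shift slightly if leap years are averaged in.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def annual_downtime_minutes(availability: float) -> float:
    """Return expected downtime in minutes per year for an availability fraction such as 0.99999."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Compare three, four, and five "9"s of availability.
    for label, availability in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
        minutes = annual_downtime_minutes(availability)
        print(f"{label}: {minutes:.2f} minutes of downtime per year")
```

Each additional “9” cuts the permitted downtime by a factor of ten, which is why the last “9”s tend to be the most expensive to achieve.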
How and When to Design for 99.999% Availability
To achieve these heights of availability, network providers must eliminate as many single points of failure as possible, where a single point of failure is a part of a network that will shut down the entire system if it fails. Eliminating single points of failure can be expensive because it requires redundancy and diversity from end to end. Every point in the network, from the route processing cards in routers to the path to the Internet core, must be redundant, and that costs money.
Oftentimes, the cost of moving from a 99.9% available network to a 99.999% available network is prohibitive for healthcare organizations. A network provider’s job is then to work with the healthcare provider to design a cost-effective solution that may not be perfectly redundant from end to end but still meets the unique availability needs of a hospital or clinic. To do that, network providers must make informed decisions about design along the whole delivery chain.
This network delivery chain can be divided into four linked components working in unison: an Internet core, middle-mile infrastructure, on-premises hardware, and associated power supplies and standbys. At each link in this chain, different single points of failure can emerge, each with its own solution. Below, we walk through these links and raise some questions and considerations for identifying and eliminating those single points of failure.
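One way to see why every link matters is to treat the chain as a series system, in which the service is up only when all four links are up. The sketch below assumes that the links fail independently, and the per-link availability figures are hypothetical values chosen for illustration, not measurements from any real network.

```python
from math import prod


def series_availability(link_availabilities):
    """Availability of a chain in which every link must be up (independent failures assumed)."""
    return prod(link_availabilities)


# Hypothetical per-link availabilities, for illustration only.
chain = {
    "service provider core": 0.99999,
    "middle-mile infrastructure": 0.9995,
    "on-premises equipment": 0.9999,
    "power supplies and standbys": 0.9999,
}

end_to_end = series_availability(chain.values())
print(f"End-to-end availability: {end_to_end:.5%}")
```

Under these assumptions the end-to-end figure is always lower than that of the weakest link, so a five-“9”s core cannot compensate for a weaker middle mile.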
1: The Service Provider Network Core
Depending on its facilities, a Service Provider will either own its network core or have a hybrid architecture in which some of its core functionality is provided by another provider. In almost all cases, Service Providers connect (peer) with other organizations so that Internet and wide area network traffic can be delivered to off-network locations. This core handles Internet traffic routing and acts as a transit point for large customer wide area networks. Regardless of whether a Service Provider owns all elements of its core, network engineers must consider that core’s location and the diverse paths into the facility when designing a healthcare network.
If only one path exists to the Internet core, your network will not be redundant from end to end, and it will fail when that path or the facility it leads to fails. It’s imperative that organizations requiring 24×7 connectivity have diverse paths to the Service Provider core to protect the critical services they provide. The core itself is usually the most highly protected portion of a Service Provider network, often containing redundant hardware in all critical systems, uninterruptible power supplies with generator backups, and redundant pathways.
2: Middle-mile Infrastructure
Unlike last-mile infrastructure, which brings connectivity directly to a customer’s facility or home, middle-mile infrastructure connects communities to the Service Provider core. Often, these connections cover vast distances and stretch across varying terrain. In extremely remote locations or harsh environments, building middle-mile infrastructure becomes a challenge in itself because of high construction costs. Government programs such as USAC’s Rural Health Care Program and the USDA’s Rural Utilities Service grants can help mitigate that cost, as can partnerships between private organizations and stakeholders.
Due to differences in landscape, environment, and cost, middle-mile infrastructure comprises a variety of transport technologies. These include fiber-optic cable, microwave, satellite, and other transport technologies. All of these technologies can be leveraged for high availability networks. Sometimes, providers use multiple technologies in one solution to ensure diversity, redundancy and resiliency. For example, network providers could propose a primary fiber service along with a backup satellite service to ensure path redundancy.
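The benefit of pairing a primary path with a backup path can be estimated with the standard formula for redundant (parallel) components: the service is down only when every path is down at once. The sketch below assumes that the two paths fail independently and that failover is instantaneous. The availability figures for fiber and satellite are hypothetical, not vendor numbers.

```python
def parallel_availability(path_availabilities):
    """Availability of redundant paths where any single working path is sufficient."""
    probability_all_down = 1.0
    for availability in path_availabilities:
        # Multiply the individual outage probabilities (independence assumed).
        probability_all_down *= 1.0 - availability
    return 1.0 - probability_all_down


# Hypothetical figures, for illustration only.
fiber = 0.999      # primary fiber service
satellite = 0.99   # backup satellite service

combined = parallel_availability([fiber, satellite])
print(f"Fiber alone:       {fiber:.3%}")
print(f"Fiber + satellite: {combined:.3%}")
```

The independence assumption is the reason diversity matters: two paths that share a conduit, a pole line, or a power feed can fail together, and the real-world gain will be smaller than this estimate.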
3: On-premises Equipment
When measuring the availability of a network, one should measure from end to end, all the way from the Internet core to the on-premises equipment. Each healthcare provider will use its own combination of on-premises equipment, depending on its unique needs. That could include:
- Routers: By definition, routers operate at Layer 3, directing packets between network devices. If the router fails, that traffic stops, so it’s useful to have more than one router at a care facility. When operating at the edge, routers can enable secure connections to a hospital or clinic’s network even from remote locations, such as a staff member’s home. Some high-end routers have redundancy built in, which can save the cost of buying multiple devices, but a single chassis remains exposed to physical damage, such as a sprinkler system activating near the device.
- Route processors (RPs): In networking, an RP is a card within a router that handles the control operations. When designing networks with redundancy in mind, it can be useful to install multiple route processors, so that if one fails or becomes overloaded the other can serve as backup.
- Satellite dishes: By installing on-premises satellite dishes and their associated power supplies, healthcare providers can enjoy direct-to-facility satellite service. This eliminates reliance on last-mile and nearby middle-mile facilities, which can greatly increase the chance that a facility will retain connectivity during a community-wide event, such as an extended power outage or widespread flooding.
4: Power Supplies and Standbys
No conversation about the redundancy of telecommunications equipment is complete without a discussion of the available power supplies. These supplies could take the form of public utilities, on-premises generators, or backup battery packs (or all of the above). For example, a hospital could leverage public power for its primary supply but maintain a backup generator in case of a failure. In this situation, it is helpful to have a generator that starts automatically rather than one that requires manual steps, because manual startup takes time and prolongs the outage.
Standbys are also an important factor in maximizing availability and minimizing mean time to repair (MTTR). Wherever possible, a standby should remain active and in sync with the main system to ensure smooth failover. A “hot” standby will reduce MTTR by maintaining up-to-date software images, router configuration and network information. A “warm” standby will need to rebuild the routing table information, while a “cold” standby will need to do a full recovery. Having no standby at all could result in a prolonged total outage and is not recommended.
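The link between MTTR and availability can be illustrated with the common steady-state formula, availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures. The sketch below is a rough illustration only: MTBF does not appear in the discussion above, and both the MTBF value and the recovery times for each standby type are hypothetical placeholders.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


MTBF_HOURS = 10_000.0  # hypothetical mean time between failures

# Hypothetical recovery times, in hours, for each standby strategy.
standby_mttr_hours = {
    "hot": 0.01,    # already in sync, fails over almost immediately
    "warm": 0.25,   # must rebuild routing table information
    "cold": 4.0,    # must perform a full recovery
    "none": 48.0,   # must wait for repair or replacement
}

for standby, mttr in standby_mttr_hours.items():
    availability = steady_state_availability(MTBF_HOURS, mttr)
    print(f"{standby:>4} standby: {availability:.5%}")
```

With the failure rate held constant, shortening recovery time is the only lever left, which is why the choice of standby has such a direct effect on the availability figure.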
More than One Component
When designing a high availability network, keep in mind that an availability target, whether 99.999% or 99.9%, cannot be met by just one component in the service delivery chain. All four links in the chain must be considered in combination, and decisions must be made at each step to balance redundancy and cost-effectiveness. If you work through these links methodically, the end result should be a hardened end-to-end delivery approach that eliminates as many single points of failure along the way as possible.